By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks to a high level of performance, either zero-shot or from small task-specific datasets. While this capability has been demonstrated in other fields such as computer vision, natural language processing, and speech recognition, it remains to be shown in robotics, where the generalization capabilities of models are particularly critical given the difficulty of collecting real-world robotic data. We argue that one of the keys to the success of such general robotic models lies in open-ended, task-agnostic training, combined with high-capacity architectures that can absorb all of the diverse robotic data. In this paper, we present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties. We verify our conclusions in a study of different model classes and their ability to generalize as a function of data size, model size, and data diversity, based on a large-scale data collection on real robots performing real-world tasks. The project's website and videos can be found at robotics-transformer.github.io
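As a rough, hypothetical illustration of what a policy in this model class consumes and produces - camera images plus a language instruction in, per-dimension discretized action tokens out - the following PyTorch sketch fixes arbitrary shapes and module choices; the actual Robotics Transformer architecture is described in the paper itself.

```python
import torch
import torch.nn as nn

class RoboticsTransformerSketch(nn.Module):
    """Hypothetical interface: image history + instruction -> discretized action bins."""

    def __init__(self, vocab_size=256, d_model=512, n_action_dims=11):
        super().__init__()
        # Stand-ins for the real vision backbone and language embedding.
        self.image_encoder = nn.Conv2d(3, d_model, kernel_size=16, stride=16)
        self.instruction_proj = nn.Linear(768, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=4)
        self.action_head = nn.Linear(d_model, n_action_dims * vocab_size)
        self.n_action_dims, self.vocab_size = n_action_dims, vocab_size

    def forward(self, images, instruction_emb):
        # images: (B, T, 3, H, W); instruction_emb: (B, 768)
        B, T = images.shape[:2]
        x = self.image_encoder(images.flatten(0, 1))   # (B*T, d, h, w)
        x = x.flatten(2).transpose(1, 2)               # (B*T, h*w, d)
        x = x.reshape(B, -1, x.shape[-1])              # (B, T*h*w, d)
        lang = self.instruction_proj(instruction_emb).unsqueeze(1)
        h = self.transformer(torch.cat([lang, x], dim=1))
        logits = self.action_head(h[:, 0])             # read out at the language token
        return logits.view(B, self.n_action_dims, self.vocab_size)
```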
Multi-agent artificial intelligence research promises a path to developing intelligent technologies that are more human-like and more human-compatible than those produced by "solipsistic" approaches, which do not consider interactions between agents. Melting Pot is a research tool developed to facilitate work on multi-agent artificial intelligence, and provides an evaluation protocol that measures generalization to novel social partners in a set of canonical test scenarios. Each scenario pairs a physical environment (a "substrate") with a reference set of co-players (a "background population") to create a social situation with substantial interdependence between the individuals involved. For instance, some scenarios were inspired by institutional-economics-based accounts of natural resource management and public-good-provision dilemmas. Others were inspired by considerations from evolutionary biology, game theory, and artificial life. Melting Pot aims to cover a maximally diverse set of interdependencies and incentives. It includes the commonly studied extreme cases of perfectly competitive (zero-sum) motivations and perfectly cooperative (shared-reward) motivations, but does not stop there. As in real life, a clear majority of scenarios in Melting Pot have mixed incentives: they are neither purely competitive nor purely cooperative, and thus demand that successful agents be able to navigate the resulting ambiguity. Here we describe Melting Pot 2.0, which revises and expands on Melting Pot. We also introduce support for scenarios with asymmetric roles, and explain how to integrate them into the evaluation protocol. This report also contains: (1) details of all substrates and scenarios; (2) a complete description of all baseline algorithms and results. Our intention is for it to serve as a reference for researchers using Melting Pot 2.0.
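A minimal sketch of the evaluation protocol the abstract describes, with assumed names and data shapes (Scenario, run_episode, and the per-player return format are illustrative, not Melting Pot's actual API): a scenario fixes a substrate and a background population, and the focal agents under test are scored by their mean per-capita return against those unfamiliar partners.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class Scenario:
    make_substrate: Callable[[], object]   # builds the physical environment
    background_population: List[object]    # pretrained co-players, held fixed
    focal_slots: int                       # players controlled by the agents under test

def evaluate(scenario: Scenario, focal_agents: Sequence,
             run_episode: Callable, episodes: int = 100) -> float:
    """Mean per-capita focal return against novel background partners.

    run_episode(env, players) is an assumed helper returning per-player returns.
    """
    total = 0.0
    for _ in range(episodes):
        env = scenario.make_substrate()
        players = list(focal_agents[:scenario.focal_slots]) + scenario.background_population
        returns = run_episode(env, players)
        total += sum(returns[:scenario.focal_slots]) / scenario.focal_slots
    return total / episodes
```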
Using massive datasets to train large-scale models has emerged as a dominant approach for broad generalization in natural language and vision applications. In reinforcement learning, however, a key challenge is that available sequential decision-making data is often not annotated with actions - for example, videos of game-play are far more abundant than sequences of frames paired with their logged game controls. We propose to circumvent this challenge by combining large but sparsely-annotated datasets from a \emph{target} environment of interest with fully-annotated datasets from various other \emph{source} environments. Our method, Action Limited PreTraining (ALPT), leverages the generalization capabilities of inverse dynamics modelling (IDM) to label missing action data in the target environment. We show that utilizing even one additional environment's labelled data during IDM pretraining yields substantial improvements in generating action labels for unannotated sequences. We evaluate our method on benchmark game-playing environments and show that it significantly improves game performance and generalization capability compared to other approaches, using annotated datasets equivalent to only $12$ minutes of gameplay. Highlighting the power of IDM, we show that these benefits persist even when the target and source environments share no common actions.
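The recipe, condensed into a hypothetical sketch (the trainer callables and dataset formats below are stand-ins, not the paper's code):

```python
def alpt(train_idm, train_policy, source_datasets, target_labelled, target_unlabelled):
    # 1. Pretrain an inverse dynamics model p(a | s_t, s_{t+1}) on every
    #    action-annotated dataset: the fully-labelled source environments plus
    #    the small labelled slice of the target environment.
    idm = train_idm(source_datasets + [target_labelled])

    # 2. Pseudo-label the large unannotated target dataset with the IDM.
    pseudo_labelled = [(s, idm(s, s_next), s_next)
                       for s, s_next in target_unlabelled]

    # 3. Train the downstream policy on the labelled plus pseudo-labelled
    #    target transitions.
    return train_policy([target_labelled, pseudo_labelled])
```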
Recent works have shown how the reasoning capabilities of large language models (LLMs) can be applied to domains beyond natural language processing, such as planning and interaction for robots. These embodied problems require an agent to understand many semantic aspects of the world: the repertoire of available skills, how these skills influence the world, and how changes to the world map back to the language. LLMs planning in embodied environments need to consider not just what skills to do, but also how and when to do them - answers that change over time in response to the agent's own choices. In this work, we investigate to what extent LLMs used in such embodied contexts can reason over sources of feedback provided through natural language, without any additional training. We propose that by leveraging environment feedback, LLMs are able to form an inner monologue that allows them to more richly process and plan in robotic control scenarios. We investigate a variety of sources of feedback, such as success detection, scene description, and human interaction. We find that closed-loop language feedback significantly improves high-level instruction completion in three domains, including simulated and real tabletop rearrangement tasks and long-horizon mobile manipulation tasks in a real-world kitchen environment.
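A hedged sketch of the closed loop this describes: each round of feedback is folded back into the LLM prompt as plain text before the next planning step. The helpers (llm, execute_skill, detect_success, describe_scene) are assumed interfaces, not the paper's implementation.

```python
def inner_monologue(llm, execute_skill, detect_success, describe_scene,
                    instruction: str, max_steps: int = 20) -> None:
    prompt = f"Task: {instruction}\n"
    for _ in range(max_steps):
        skill = llm(prompt + "Robot action:")   # LLM proposes the next skill
        if skill.strip() == "done":
            break
        outcome = execute_skill(skill)          # low-level controller runs it
        # Fold textual feedback back into the prompt before the next step.
        prompt += (f"Robot action: {skill}\n"
                   f"Success: {detect_success(outcome)}\n"
                   f"Scene: {describe_scene(outcome)}\n")
```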
Deep learning has excelled on complex pattern recognition tasks such as image classification and object recognition. However, it struggles with tasks requiring nontrivial reasoning, such as algorithmic computation. Humans are able to solve such tasks through iterative reasoning - spending more time thinking about harder tasks. Most existing neural networks, however, exhibit a fixed computational budget controlled by the network architecture, preventing additional computational processing on harder tasks. In this work, we present a new framework for iterative reasoning with neural networks. We train a neural network to parameterize an energy landscape over all outputs, and implement each step of iterative reasoning as an energy minimization step toward a minimal-energy solution. By framing reasoning as an energy minimization problem, we can adjust our underlying computational budget for harder problems, which induce more complex energy landscapes, by running a more elaborate optimization procedure. We empirically show that our iterative reasoning approach solves algorithmic reasoning tasks more accurately and generalizably in both graph and continuous domains. Finally, we show that our approach can recursively solve algorithmic problems that require nested reasoning.
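A minimal sketch of the inference-time procedure, under assumed shapes and step sizes (the paper's training of the energy landscape is more involved): the answer is obtained by gradient descent on a learned energy, and harder problems can simply be given more optimization steps.

```python
import torch

def iterative_reasoning(energy_net, x, y_dim, n_steps=10, step_size=0.1):
    """Answer a query x by gradient descent on a learned energy E(x, y)."""
    y = torch.zeros(x.shape[0], y_dim, requires_grad=True)
    for _ in range(n_steps):  # more steps = larger compute budget
        energy = energy_net(x, y).sum()
        grad, = torch.autograd.grad(energy, y)
        y = (y - step_size * grad).detach().requires_grad_(True)
    return y.detach()
```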
Dexterous manipulation of arbitrary objects, a fundamental everyday task for humans, has been a grand challenge for autonomous robotic systems. Although data-driven approaches using reinforcement learning can develop specialist policies that discover the behaviors needed to control a single object, they often exhibit poor generalization. In this work, we show that policies learned by existing reinforcement learning algorithms can in fact be generalist when combined with multi-task learning and a well-chosen object representation. We show that a single generalist policy can perform in-hand manipulation of over 100 geometrically different real-world objects and generalize to new objects with unseen shapes or sizes. Interestingly, we find that multi-task learning with object point cloud representations not only generalizes better but even outperforms single-object specialist policies, on both the training objects and held-out test objects. Video results are available at https://huangwl18.github.io/geometry-dex
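For illustration only, a single generalist policy conditioned on an object point cloud might look like the following sketch; the PointNet-style max-pooled encoder is an assumption here, not necessarily the representation used in the paper.

```python
import torch
import torch.nn as nn

class PointCloudPolicy(nn.Module):
    def __init__(self, proprio_dim, action_dim, feat_dim=256):
        super().__init__()
        self.point_mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                       nn.Linear(64, feat_dim))
        self.policy = nn.Sequential(nn.Linear(feat_dim + proprio_dim, 256),
                                    nn.ReLU(), nn.Linear(256, action_dim))

    def forward(self, points, proprio):
        # points: (B, N, 3) object point cloud; proprio: (B, proprio_dim)
        feat = self.point_mlp(points).max(dim=1).values  # permutation-invariant pooling
        return self.policy(torch.cat([feat, proprio], dim=-1))
```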
Humans are able to rapidly understand scenes by utilizing concepts extracted from prior experience. These concepts are diverse, and include global scene descriptors, such as the weather or lighting, as well as local scene descriptors, such as the color or size of a particular object. So far, unsupervised concept discovery has focused on modeling either global scene-level factors or local object-level factors, but not both. In this work, we propose COMET, which discovers and represents concepts as separate energy functions, enabling us to represent both global concepts and objects under a unified framework. COMET discovers energy functions by recomposing the input image, which we find captures independent factors without additional supervision. Sample generation in COMET is formulated as an optimization process over the underlying energy functions, enabling us to generate images with permuted and composed concepts. Finally, visual concepts discovered by COMET generalize well, enabling us to compose concepts between separate modalities of images, as well as with other concepts discovered by a separate instance of COMET trained on a different dataset. Code and data are available at https://energy-based-model.github.io/comet/
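A hedged sketch of the compositional generation this describes, with illustrative optimizer settings: an image is synthesized by gradient descent under the sum of whichever concept energies are selected.

```python
import torch

def compose_and_generate(energy_fns, img_shape, n_steps=50, step_size=0.05):
    """Generate an image that is low-energy under every chosen concept."""
    img = torch.rand(1, *img_shape, requires_grad=True)
    for _ in range(n_steps):
        total_energy = sum(E(img) for E in energy_fns).sum()
        grad, = torch.autograd.grad(total_energy, img)
        img = (img - step_size * grad).clamp(0, 1).detach().requires_grad_(True)
    return img.detach()
```

Permuting or swapping entries of energy_fns between COMET instances is, on this view, what recombining concepts amounts to.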
We motivate Energy-Based Models (EBMs) as a promising model class for continual learning problems. Instead of tackling continual learning via the use of external memory, growing models, or regularization, EBMs change the underlying training objective to cause less interference with previously learned information. Our proposed version of EBMs for continual learning is simple, efficient, and outperforms baseline methods by a large margin on several benchmarks. Moreover, our proposed contrastive divergence-based training objective can be combined with other continual learning methods, resulting in substantial boosts in their performance. We further show that EBMs are adaptable to a more general continual learning setting where the data distribution changes without the notion of explicitly delineated tasks. These observations point towards EBMs as a useful building block for future continual learning methods.
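In the spirit of the contrastive divergence-based objective mentioned above, a minimal sketch (the negative-sampling scheme and shapes are assumptions, not the paper's exact loss): each observed pair is contrasted against a small set of sampled negative labels rather than a softmax over every class ever seen, one way such an objective can touch less previously learned structure.

```python
import torch
import torch.nn.functional as F

def ebm_contrastive_step(energy_net, x, y, negative_labels):
    """energy_net(x, y) -> per-example scalar energy, shape (B,)."""
    e_pos = energy_net(x, y)                                                  # (B,)
    e_neg = torch.stack([energy_net(x, yn) for yn in negative_labels], dim=1)  # (B, K)
    # Contrast the positive pair against sampled negatives only.
    logits = torch.cat([-e_pos.unsqueeze(1), -e_neg], dim=1)  # (B, 1 + K)
    target = torch.zeros(x.shape[0], dtype=torch.long)        # positive is index 0
    return F.cross_entropy(logits, target)
```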
We introduce the $\gamma$-model, a predictive model of environment dynamics with an infinite probabilistic horizon. Replacing standard single-step models with $\gamma$-models leads to generalizations of the procedures central to model-based control, including the model rollout and model-based value estimation. The $\gamma$-model, trained with a generative reinterpretation of temporal difference learning, is a natural continuous analogue of the successor representation and a hybrid between model-free and model-based mechanisms. Like a value function, it contains information about the long-term future; like a standard predictive model, it is independent of task reward. We instantiate the $\gamma$-model as both a generative adversarial network and a normalizing flow, discuss how its training reflects an inescapable tradeoff between training-time and testing-time compounding errors, and empirically demonstrate its utility for prediction and control.
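The geometric-horizon semantics can be made concrete with a small sketch: a sample from a $\gamma$-model is distributed like a one-step model rolled for a Geometric$(1-\gamma)$ number of steps. The rollout below uses an assumed one-step simulator; the paper instead trains the $\gamma$-model directly with a TD-style generative objective, avoiding the rollout entirely.

```python
import random

def gamma_model_sample(step_fn, state, gamma=0.95):
    """Draw a future state whose horizon is Geometric(1 - gamma)."""
    while True:
        state = step_fn(state)       # one environment/model step
        if random.random() > gamma:  # terminate with probability 1 - gamma
            return state
```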
This report summarizes the work carried out by the authors during the Twelfth Montreal Industrial Problem Solving Workshop, held at Universit\'e de Montr\'eal in August 2022. The team tackled a problem submitted by CBC/Radio-Canada on the theme of Automatic Text Simplification (ATS).